Quantized Densely Connected U-Nets for Efficient Landmark Localization
In this paper, we propose quantized densely connected U-Nets for efficient
visual landmark localization. The idea is that features of the same semantic
meanings are globally reused across the stacked U-Nets. This dense connectivity
largely improves the information flow, yielding improved localization accuracy.
However, a vanilla dense design would suffer from critical efficiency issues in
both training and testing. To solve this problem, we first propose order-K
dense connectivity to trim off long-distance shortcuts; then, we use a
memory-efficient implementation to significantly boost the training efficiency
and investigate an iterative refinement that may slice the model size in half.
Finally, to reduce the memory consumption and high precision operations both in
training and testing, we further quantize weights, inputs, and gradients of our
localization network to low bit-width numbers. We validate our approach in two
tasks: human pose estimation and face alignment. The results show that our
approach achieves state-of-the-art localization accuracy while using ~70% fewer
parameters, ~98% smaller model size, and ~75% less training memory than
other benchmark localizers. The code is available at
https://github.com/zhiqiangdon/CU-Net.
Comment: ECCV201
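To illustrate two ideas from the abstract above, the sketch below lists which earlier U-Nets feed each U-Net once long-distance shortcuts are trimmed, and quantizes a value to a low bit-width. It rests on two assumptions that the abstract does not confirm: that order-K connectivity keeps only the K nearest preceding U-Nets, and that a plain uniform quantizer on [0, 1] stands in for the paper's quantization scheme. The names `order_k_inputs`, `count_connections` and `quantize_uniform` are hypothetical and are not taken from the CU-Net code.

```python
def order_k_inputs(index, k):
    """Return indices of earlier U-Nets connected to U-Net `index` (0-based).

    Assumption: order-K connectivity keeps only the K nearest predecessors.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    return list(range(max(0, index - k), index))


def count_connections(num_unets, k=None):
    """Count connections in a stack of U-Nets; k=None means fully dense."""
    total = 0
    for i in range(num_unets):
        limit = i if k is None else k
        total += len(order_k_inputs(i, limit))
    return total


def quantize_uniform(x, bits):
    """Map x (clamped to [0, 1]) onto one of 2**bits evenly spaced levels.

    Assumption: a generic uniform quantizer, used here only for illustration.
    """
    if bits < 1:
        raise ValueError("bits must be at least 1")
    levels = 2 ** bits - 1
    x = min(1.0, max(0.0, x))
    return round(x * levels) / levels


if __name__ == "__main__":
    for i in range(5):
        print(i, order_k_inputs(i, k=2))
    print("dense:", count_connections(8), "order-2:", count_connections(8, k=2))
    print("0.3 at 2 bits:", quantize_uniform(0.3, 2))
```

Under these assumptions, a stack of eight U-Nets has 28 connections when fully dense and 13 with K = 2, which shows how trimming long shortcuts limits the number of feature maps each U-Net has to receive.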
Learning to Detect and Track Visible and Occluded Body Joints in a Virtual World
Multi-people tracking in an open-world setting requires special effort in precise detection. Moreover, temporal continuity in the detection phase becomes more important when scene clutter introduces the challenging problem of occluded targets. To this end, we propose a deep network architecture that jointly extracts people's body parts and associates them across short temporal spans. Our model explicitly deals with occluded body parts by hallucinating plausible solutions for non-visible joints. We propose a new end-to-end architecture composed of four branches (visible heatmaps, occluded heatmaps, part affinity fields and temporal affinity fields) fed by a time linker feature extractor. To overcome the lack of surveillance data with tracking, body part and occlusion annotations, we created a Computer Graphics dataset for people tracking in urban scenarios by exploiting a photorealistic videogame. It is, to date, the largest dataset (about 500,000 frames, almost 10 million body poses) of human body parts for people tracking in urban scenarios. Our architecture, trained on virtual data, also generalizes well to public real tracking benchmarks when image resolution and sharpness are high enough, producing reliable tracklets useful for further batch data association or re-id modules.
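To make the visible and occluded heatmap branches mentioned above more concrete, the following sketch decodes one joint from a pair of heatmaps by taking the stronger peak and recording which branch produced it. The abstract does not describe how the network outputs are decoded, so the decoding rule, the score threshold and the names `peak` and `decode_joint` are all assumptions made for illustration.

```python
def peak(heatmap):
    """Return (row, col, score) of the maximum of a 2D list of floats."""
    best = (0, 0, heatmap[0][0])
    for r, row in enumerate(heatmap):
        for c, score in enumerate(row):
            if score > best[2]:
                best = (r, c, score)
    return best


def decode_joint(visible_map, occluded_map, threshold=0.5):
    """Pick the stronger of the two branch peaks; None if both are weak.

    Assumption: a joint is labelled occluded when the occluded-branch
    peak is strictly stronger than the visible-branch peak.
    """
    v = peak(visible_map)
    o = peak(occluded_map)
    row, col, score = v if v[2] >= o[2] else o
    if score < threshold:
        return None
    return {"row": row, "col": col, "occluded": v[2] < o[2]}


if __name__ == "__main__":
    visible = [[0.1, 0.2], [0.0, 0.3]]
    occluded = [[0.0, 0.9], [0.1, 0.0]]
    print(decode_joint(visible, occluded))
```

A full pipeline would also need the part affinity fields to group joints into people and the temporal affinity fields to link them across frames; those steps are left out of this sketch.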